Calculations/Tables

##Loading Required Libraries
###Downloading Dataset
tibble [9,994 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Ship Mode   : chr [1:9994] "Second Class" "Second Class" "Second Class" "Standard Class" ...
 $ Segment     : chr [1:9994] "Consumer" "Consumer" "Corporate" "Consumer" ...
 $ Country     : chr [1:9994] "United States" "United States" "United States" "United States" ...
 $ City        : chr [1:9994] "Henderson" "Henderson" "Los Angeles" "Fort Lauderdale" ...
 $ State       : chr [1:9994] "Kentucky" "Kentucky" "California" "Florida" ...
 $ Postal Code : num [1:9994] 42420 42420 90036 33311 33311 ...
 $ Region      : chr [1:9994] "South" "South" "West" "South" ...
 $ Category    : chr [1:9994] "Furniture" "Furniture" "Office Supplies" "Furniture" ...
 $ Sub-Category: chr [1:9994] "Bookcases" "Chairs" "Labels" "Tables" ...
 $ Sales       : num [1:9994] 262 731.9 14.6 957.6 22.4 ...
 $ Quantity    : num [1:9994] 2 3 2 5 2 7 4 6 3 5 ...
 $ Discount    : num [1:9994] 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
 $ Profit      : num [1:9994] 41.91 219.58 6.87 -383.03 2.52 ...
 - attr(*, "spec")=
  .. cols(
  ..   `Ship Mode` = col_character(),
  ..   Segment = col_character(),
  ..   Country = col_character(),
  ..   City = col_character(),
  ..   State = col_character(),
  ..   `Postal Code` = col_double(),
  ..   Region = col_character(),
  ..   Category = col_character(),
  ..   `Sub-Category` = col_character(),
  ..   Sales = col_double(),
  ..   Quantity = col_double(),
  ..   Discount = col_double(),
  ..   Profit = col_double()
  .. )

Exploratory Data Analysis

Data Description Data contains Sales records from a Super-Store located in USA Ship Mode- The mode in which the product was shipped. 4 Possible Modes “First Class” “Same Day” “Second Class” “Standard Class”

*Segment- The Sale segment +3 Possible Segments +“Consumer” “Corporate” “Home Office”

*Country- Country in which the sale was made. (United States)

*City- The city in which the sale was made .

*State- State in which the sale was made.

*Postal Code- Postal Code for the region of the sale.

Region - The Region in which the sale was made. +4 possible values + “South” “West” “Central” “East” Category- The category of product which was purchased. +3 possible values +“Furniture” “Office Supplies” “Technology” *Sub Category- A more in-depth category of update +17 possible values

*Sales- Amount of money transaction for the Sale

*Quantity- Quantity of products involved in the sale

*Discount- Discount applied on the sale

*Profit- Profit made from the sale

Describe the data
# A tibble: 4 x 26
  variable     n    na    mean      sd se_mean   IQR skewness kurtosis      p00
  <chr>    <int> <int>   <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>    <dbl>
1 Discount  9994     0   0.156   0.206 0.00207   0.2     1.68     2.41  0.     
2 Quantity  9994     0   3.79    2.23  0.0223    3       1.28     1.99  1.00e+0
3 Profit    9994     0  28.7   234.    2.34     27.6     7.56   397.   -6.60e+3
4 Sales     9994     0 230.    623.    6.23    193.     13.0    305.    4.44e-1
# ... with 16 more variables: p01 <dbl>, p05 <dbl>, p10 <dbl>, p20 <dbl>,
#   p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
#   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>
Overview of the data
# A tibble: 1 x 2
      Sale  Profit
     <dbl>   <dbl>
1 2297201. 286397.
Number of Sales
# A tibble: 6 x 3
  State            N profit
  <chr>        <int>  <dbl>
1 California    2001   38.2
2 New York      1128   65.6
3 Texas          985  -26.1
4 Pennsylvania   587  -26.5
5 Washington     506   66.0
6 Illinois       492  -25.6
[1] 9994
Breakdown of Number of Sales State-Wise sale % contribution
# A tibble: 49 x 2
   State          PerCent
   <chr>            <dbl>
 1 California       20.0 
 2 New York         11.3 
 3 Texas             9.86
 4 Pennsylvania      5.87
 5 Washington        5.06
 6 Illinois          4.92
 7 Ohio              4.69
 8 Florida           3.83
 9 Michigan          2.55
10 North Carolina    2.49
# ... with 39 more rows
City-Wise sale % Contribution
# A tibble: 531 x 2
   City          PerCent
   <chr>           <dbl>
 1 New York City    9.16
 2 Los Angeles      7.47
 3 Philadelphia     5.37
 4 San Francisco    5.10
 5 Seattle          4.28
 6 Houston          3.77
 7 Chicago          3.14
 8 Columbus         2.22
 9 San Diego        1.70
10 Springfield      1.63
# ... with 521 more rows
Profit Breakdown
# A tibble: 49 x 3
   State      Total_Profit    SD
   <chr>             <dbl> <dbl>
 1 California       76381.  97.7
 2 New York         74039. 233. 
 3 Washington       33403. 325. 
 4 Michigan         24463. 375. 
 5 Virginia         18598. 235. 
 6 Indiana          18383. 694. 
 7 Georgia          16250. 283. 
 8 Kentucky         11200. 171. 
 9 Minnesota        10823. 506. 
10 Delaware          9977. 519. 
# ... with 39 more rows
Avg-Discounts (State Wise)
# A tibble: 6 x 2
  State          Avg
  <chr>        <dbl>
1 Illinois     0.390
2 Texas        0.370
3 Pennsylvania 0.329
4 Ohio         0.325
5 Colorado     0.316
6 Arizona      0.304
# A tibble: 6 x 2
  State          Total
  <chr>          <dbl>
1 Arizona       35282.
2 Colorado      32108.
3 Illinois      80166.
4 Ohio          78258.
5 Pennsylvania 116512.
6 Texas        170188.
Which Category of Sales brings the maximum sales and profits?
# A tibble: 3 x 3
  Category        Total_Sale Total_Profit
  <chr>                <dbl>        <dbl>
1 Furniture          742000.       18451.
2 Office Supplies    719047.      122491.
3 Technology         836154.      145455.
# A tibble: 46 x 2
   State              N
   <chr>          <int>
 1 California       359
 2 New York         208
 3 Texas            179
 4 Pennsylvania     119
 5 Ohio             102
 6 Washington       100
 7 Illinois          84
 8 Florida           69
 9 North Carolina    51
10 Arizona           48
# ... with 36 more rows
Number of Sales for each Category
# A tibble: 3 x 2
  Category        `n()`
  <chr>           <int>
1 Furniture        2121
2 Office Supplies  6026
3 Technology       1847

In more Detail

# A tibble: 142 x 3
# Groups:   State [49]
   State        Category            N
   <chr>        <chr>           <int>
 1 California   Office Supplies  1198
 2 New York     Office Supplies   684
 3 Texas        Office Supplies   604
 4 California   Furniture         444
 5 California   Technology        359
 6 Pennsylvania Office Supplies   343
 7 Washington   Office Supplies   292
 8 Illinois     Office Supplies   285
 9 Ohio         Office Supplies   274
10 New York     Furniture         236
# ... with 132 more rows
Check what % of sales were discounted for big sale spots (States)
# A tibble: 20 x 5
   State              T_S   T_D     T_P Percent
   <chr>            <dbl> <dbl>   <dbl>   <dbl>
 1 California     457688. 146.   76381. 0.0318 
 2 New York       310876.  62.4  74039. 0.0201 
 3 Washington     138641.  32.4  33403. 0.0234 
 4 Michigan        76270.   1.8  24463. 0.00236
 5 Virginia        70637.   0    18598. 0      
 6 Indiana         53555.   0    18383. 0      
 7 Georgia         49096.   0    16250. 0      
 8 Kentucky        36592.   0    11200. 0      
 9 Minnesota       29863.   0    10823. 0      
10 New Jersey      35764.   0.6   9773. 0.00168
11 Wisconsin       32115.   0     8402. 0      
12 Florida         89474. 115.   -3399. 0.128  
13 Arizona         35282.  68    -3428. 0.193  
14 Tennessee       30662.  53.3  -5342. 0.174  
15 Colorado        32108.  57.6  -6528. 0.179  
16 North Carolina  55603.  70.6  -7491. 0.127  
17 Illinois        80166. 192.  -12608. 0.239  
18 Pennsylvania   116512. 193.  -15560. 0.166  
19 Ohio            78258. 152.  -16971. 0.195  
20 Texas          170188. 365.  -25729. 0.214  

Visual Representations

Row

Tree-Map

Row

Biggest Sale Spots

Row

Word Cloud- Sub Categories

Basic Visuals-1

Row

Number of Sales(State-Wise)

Row

Category Division

Row

Number of Sales Region Wise

Basic Visuals-2

Number of sales category-wise

Conclusion

Conclusion We see the following Trends in the data: States with higher sales have lower Percentage of Discounts States with higher Discount Rates are actually losing the Store money

Suggestion: Increase Prices in States with Discount % more than 0.1 Most States with high sales don’t have any discounts applied, following this the top 10 sale states should have 0 Discounts to bring in more profit.

Our most valuable Category is Office supplies in the high Sales states, thus we should increase stock and prices of this category in the same states

California and New York are the two most important locations for this store, it would be advised to focus their attention and expand in these two location as these have the most promise.

---
title: "Task_5 Dashboard"
output: 
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
    source_code: embed
    theme: cerulean
---

Calculations/Tables 
=====================================  


```{r setup, include=FALSE}
library(flexdashboard)
```


##Loading Required Libraries
```{r}
library(gsheet)#Access data from docs
library(dlookr)
library(dplyr)
library(ggplot2)
library(forcats)
library(magrittr)
library(treemap)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
library(leaflet)
library(widgetframe)
library(doBy)
library(viridisLite)
```

###Downloading Dataset
```{r}
superstore <- gsheet2tbl('https://docs.google.com/spreadsheets/d/1ELHYd5wKnLdJE3tuszI-Ced_hyye5CnOnArshR_UGPI/edit?usp=sharing') 
str(superstore)
```

**Exploratory Data Analysis**

*Data Description*
Data contains Sales records from a Super-Store located in USA
Ship Mode- The mode in which the product was shipped.
            4 Possible Modes
                "First Class" "Same Day"  "Second Class" "Standard Class"
            
*Segment- The Sale segment
          +3 Possible Segments
            +"Consumer"    "Corporate"   "Home Office"

*Country- Country in which the sale was made. (United States)

*City- The city in which the sale was made .

*State- State in which the sale was made.

*Postal Code- Postal Code for the region of the sale.

*Region - The Region in which the sale was made.
          +4 possible values
           + "South"   "West"    "Central" "East"
*Category- The category of product which was purchased.
            +3 possible values
              +"Furniture" "Office Supplies" "Technology"
*Sub Category- A more in-depth category of update
                +17 possible values
                
*Sales- Amount of money transaction for the Sale

*Quantity- Quantity of products involved in the sale

*Discount- Discount applied on the sale

*Profit- Profit made from the sale


**Describe the data** 
```{r}
describe(superstore,Discount,Quantity,Profit,Sales)

```

**Overview of the data**
```{r}
Overview<-superstore %>% summarise(Sale=sum(Sales),Profit=sum(Profit))
Overview

```

**Number of Sales**
```{r}
#Get the number of sales made in each state to find useful information 
No_sales<-superstore %>% group_by(State) %>% summarise(N=n(),profit=mean(Profit)) %>% arrange(desc(N)) %>% head()
No_sales

#Total Sales made:
nrow(superstore)

```



**Breakdown of Number of Sales**
*State-Wise sale % contribution*
```{r}
#State-Wise sale % contribution
superstore %>% group_by(State) %>% summarise(PerCent=n()*100/nrow(superstore)) %>% arrange(desc(PerCent))
```

*City-Wise sale % Contribution*
```{r}
#City-Wise sale % Contribution
superstore %>% group_by(City) %>% summarise(PerCent=n()*100/nrow(superstore)) %>% arrange(desc(PerCent))
```


**Profit Breakdown**
```{r}
#Find the states that are most profitable
superstore %>% group_by(State) %>% summarise(Total_Profit=sum(Profit),SD=sd(Profit)) %>% arrange(desc(Total_Profit))

```


**Avg-Discounts (State Wise)**
```{r}

dis<-superstore %>% group_by(State) %>% summarise(Avg=mean(Discount)) %>% arrange(desc(Avg)) %>% head()
dis

superstore %>% group_by(State) %>% filter(State%in%dis$State) %>% summarise(Total=sum(Sales))

sales<-superstore %>% group_by(State) %>% summarise(Total=sum(Sales)) %>% arrange(desc(Total))

#Find the states which bring the most amount of money in sales
valued_states<-sales$State[1:20]

```



**Which Category of Sales brings the maximum sales and profits? **
```{r}
superstore %>% group_by(Category) %>% summarise(Total_Sale=sum(Sales),Total_Profit=sum(Profit))

superstore %>% group_by(State) %>% filter(Category=="Technology") %>% summarise(N=n()) %>% arrange(desc(N))
#We see California pops up again as the biggest customer 

```


**Number of Sales for each Category**
```{r}
superstore %>% group_by(Category) %>% summarise(n())

```

**In more Detail**

```{r}
Sold_Category<-superstore %>% group_by(State,Category) %>% summarise(N=n()) %>% arrange(desc(N))
Sold_Category

```

**Check what % of sales were discounted for big sale spots (States)**
```{r}

superstore %>% group_by(State) %>% filter(State%in%valued_states) %>%
  summarise(T_S=sum(Sales),T_D=sum(Discount),T_P=sum(Profit),Percent=T_D*100/T_S) %>% arrange(desc(T_P))

```

Visual Representations {data-orientation=rows}
===================================== 
Row
-------------------------------------
**Tree-Map**
```{r}
treemap(superstore,index=c("Category","Sub-Category"),vSize = "Sales",vColor = "Profit",type = "value",palette = rev(viridis(6)))

```

Row
-------------------------------------
**Biggest Sale Spots**
```{r}
cities<-data.frame(S=c("California","New York","Texas","Pennsylvania"),lat=c(35.8863,43.1007,29.4128,41.2033),lng=c(-118.656,-75.2932,-94.9658,-77.1945))
cities<-setNames(cities,c("State","lat","lng"))
temp<-merge(cities,No_sales)[1:4,1:4]

map <- leaflet() %>%
  setView(lat = 36, lng = -99.7129, zoom=4) %>%
  addTiles(group="OSM") %>%
  addProviderTiles("Esri.NatGeoWorldMap") %>%
  addCircleMarkers(data =temp, ~lng, ~lat,weight = 0.5, col = 'black',fillColor = "purple", radius =~ N/100 , fillOpacity = 0.6, stroke = T, label = ~paste0(as.character(State),'  ',as.character(N)),group = 'Points')
map

```

Row
-------------------------------------
**Word Cloud- Sub Categories**
```{r}
wordcloud(words = superstore$`Sub-Category`,min.freq = 1,max.words = 100,random.order = F,rot.per = 0.35,colors = brewer.pal(8,'Set2'))

```


 Basic Visuals-1 {data-orientation=rows}
===================================== 

Row
-------------------------------------
*Number of Sales(State-Wise)*

```{r}
t<-superstore %>% group_by(State) %>% summarise(N=n(),profit=mean(Profit)) %>% arrange(desc(N))
sup<-superstore %>% filter(State%in%t$State[1:9])
ggplot(mutate(sup,State=fct_infreq(State)))+geom_bar(aes(x=State))+ggtitle("Number of Sales","State-Wise")+theme_gray()
```

Row
-------------------------------------
*Category Division*
```{r}
gd<-superstore %>% group_by(Category) %>% summarise(Sales=sum(Sales))

pct<-round(gd$Sales/sum(gd$Sales)*100)

lbls<-paste(gd$Category,pct)
lbls<-paste(lbls,"%",sep=" ")
colour<-c("#AC6D60","#85CF73","#8AC1F0")
pie(gd$Sales,labels = lbls,main = "Percentage Sales by Category",col = colour)

```

Row
-------------------------------------
*Number of Sales Region Wise*
```{r}
ggplot(mutate(superstore,Region=fct_infreq(Region)))+geom_bar(aes(x=Region))+ggtitle("Number of Sales","Region-Wise")+theme_gray()
```



 Basic Visuals-2 {data-orientation=rows}
===================================== 


*Number of sales category-wise*
```{r}

ggplot(mutate(superstore,Category=fct_infreq(Category)))+geom_bar(aes(x=Category))+ggtitle("Number of Sales","Category-Wise")+theme_gray()

```


Conclusion {data-orientation=rows}
===================================== 
**Conclusion**
We see the following Trends in the data:
States with higher sales have lower Percentage of Discounts 
States with higher Discount Rates are actually losing the Store money

**Suggestion:**
Increase Prices in States with Discount % more than 0.1
Most States with high sales don't have any discounts applied, following this the top 10 sale states should have 0 Discounts to bring in more profit.

Our most valuable Category is Office supplies in the high Sales states, thus we should increase stock and prices of this category in the same states 

California and New York are the two most important locations for this store, it would be advised to focus their attention and expand in these two location as these have the most promise.